perm filename OGDENI.DOC[2,TES] blob
sn#039431 filedate 1973-04-26 generic text, type T, neo UTF8
OGDEN IMPLEMENTATION NOTES
compiled by Larry Tesler
April 26, 1973 ... Version 1
April 26, 1973 OGDEN IMPLEMENTATION
T A B L E O F C O N T E N T S
SECTION PAGE
I STRUCTURES
I.A TREE STRUCTURES . . . . . . . . . . 1
I.B COORDINATE IDENTIFICATION . . . . . . . 1
I.C COORDINATE TRANSFORMATION . . . . . . . 1
I.D GENERIC FUNCTIONS . . . . . . . . . 2
II EDITING CONSIDERATIONS
II.A CREATION OF BOXES . . . . . . . . . 3
II.B DELETION OF BOXES . . . . . . . . . 3
III REPRESENTATION OF WORDS
III.A WORDS AS ATOMS . . . . . . . . . . 4
III.B WORDS AS STRINGS . . . . . . . . . . 4
IV LARGE FILES
IV.A LIST STRUCTURED FILES . . . . . . . . 5
IV.B FRAGMENT FILES . . . . . . . . . . 5
IV.C STRUCTURE FILES . . . . . . . . . . 5
V DUMPING INTERNAL STRUCTURES
V.A DUMPER . . . . . . . . . . . . . 6
i
April 26, 1973 OGDEN IMPLEMENTATION
V.B LOADER . . . . . . . . . . . . . 6
ii
April 26, 1973 OGDEN IMPLEMENTATION
SECTION I
STRUCTURES
TREE STRUCTURES
I.A. A "structure" is a tree (for now) in which each node refers to
an atom or to a molecules. For example, a paragraph can occur in
outline structure with sentences as direct subnodes and again in
galley structure with text lines as direct subnodes.
Although the structures imposed on a document may be quite different
and overlapping, each that we have thought of can be contorted into a
special case of a tree structure. In the most degenerate case
(cross-reference structure), each cross-reference is a tree of one
leaf emanating from a root. It is possible to generalize to directed
graph structures, but this seems an unnecessary complication.
Since there are many kinds of structures, the system will have to
have a notion of type. To avoid confusion with primitive data types,
we should find another term, say, "kinds".
COORDINATE IDENTIFICATION
I.B. A node within a tree structure is best identified by a list of
the names of the atoms and molecules referenced by itself and by all
of its direct ancestors. This list provides the "coordinates" of the
node within the structure. A more complete identification of a node
for global use is this coordinate list prefixed by the identifier of
the entire structure.
COORDINATE TRANSFORMATION
I.C. A single atom or molecule may be referenced by many structures.
It must be possible to transform the coordinates of its referent node
1
April 26, 1973 OGDEN IMPLEMENTATION
in one structure to the coordinates of its referent node in any other
structure. One of the requirements of a newly invented structure is
that the transformations to all other structures be defined. Some of
these transformations may involve multiple stages. It has not yet
been proven that this is possible.
GENERIC FUNCTIONS
I.D. Editing and formatting functions are "generic" with versions
for each kind of structure.
2
April 26, 1973 OGDEN IMPLEMENTATION
SECTION II
EDITING CONSIDERATIONS
CREATION OF BOXES
II.A. From the galley position of a new box, OGDEN must compute for
each structure of the system the position at which new information
must be added because of the creation. Coordinate transformations
are used for this purpose. A structural update must also be
considered for components of the box, such as words and phrases that
are being indexed.
DELETION OF BOXES
II.B. When a box is deleted, all references to it are affected. The
generic "delete" functions for all affected structures are applied.
The coordinate transformation scheme had better take care of this
correctly.
3
April 26, 1973 OGDEN IMPLEMENTATION
SECTION III
REPRESENTATION OF WORDS
WORDS AS ATOMS
III.A. In the first version, we will represent each text word as a
LISP atom. This simplifies the implementation and gives us
properties for free. One atom will be needed for each distict word
in the document. The main disadvantage is that of the three words
that LISPX allocates for each atom, we only use at most two half-
words: PNAME and PROPERTIES. When we get to big documents, this may
be a disadvantage.
WORDS AS STRINGS
III.B. An alternative is to make each word a string; but to hang a
property on it, we will have to use two hash tables. One takes the
small integer scramble of a string and yields a list of propertied
strings known to have that scramble; if the string is going to have
properties for the first time, it is added to this list. The other
hash table takes a propertied string and a property indicator and
yields a property value.
4
April 26, 1973 OGDEN IMPLEMENTATION
SECTION IV
LARGE FILES
LIST STRUCTURED FILES
IV.A. Peter Deutsch has a package that allows cons pairs to be
stored in ascii on files, providing for car, cdr, and cons functions
operating on this data. Thus, when we run out of free storage in
core, we can start to use these files for infrequently accessed list
structures.
FRAGMENT FILES
IV.B. Each fragment can be stored on a separate file. If we have
space problems but don't want to resort to Peter's package, we can at
least keep in core only the fragment currently being edited. If we
do use Peter's scheme, we can keep the fragment in the file and only
keep in core the part needed to maintain the display list.
STRUCTURE FILES
IV.C. Even if the fragment structure is on files, it will probably
be possible to keep the other structures in core at all times.
However, if this becomes impractical, they also can be put on files.
5
April 26, 1973 OGDEN IMPLEMENTATION
SECTION V
DUMPING INTERNAL STRUCTURES
DUMPER
V.A. Assuming that structures are stored internally, dumping them on
files between edits is a problem because internally structures refer
to subfragments by absolute addresses but externally they must refer
to relative locations. Thus, a dumper will be needed to convert the
absolute locations to relative locations.
LOADER
V.B. The loader does the inverse job of the dumper. To svoid having
to search a list N long to translate relative-N to an absolute
address, there can be an array which points to every 100th element of
a fragment. Thus, no more then 100 elements must be searched.
6